CS-2002-03 A Large, Fast Instruction Window for Tolerating Cache Misses

Authors

Tong Li

Jinson Koppanalil

Alvin R. Lebeck

Jaidev Patwardhan

Eric Rotenberg

Abstract

Instruction window size is an important design parameter for many modern processors. Large instruction windows offer the potential advantage of exposing large amounts of instruction-level parallelism. Unfortunately, naively scaling conventional window designs can significantly degrade clock cycle time, undermining the benefits of increased parallelism. This paper presents a new instruction window design targeted at achieving the latency tolerance of large windows with the clock cycle time of small windows. The key observation is that instructions dependent on a long-latency operation (e.g., a cache miss) cannot execute until that source operation completes. These instructions are moved out of the conventional small issue queue into a much larger waiting instruction buffer (WIB). When the long-latency operation completes, the instructions are reinserted into the issue queue. In this paper, we focus specifically on load cache misses and their dependent instructions. Simulations reveal that, for an 8-way processor, a 2K-entry WIB with a 32-entry issue queue can achieve speedups of 20%, 84%, and 50% over a conventional 32-entry issue queue for a subset of the SPEC CINT2000, SPEC CFP2000, and Olden benchmarks, respectively.
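The move-out and reinsertion idea described in the abstract can be illustrated with a small Python model. This is a simplified sketch, not the authors' hardware design: the names `Instr` and `WIBWindow`, the dictionary that records which outstanding misses each instruction waits on, and the oldest-first reinsertion policy are all hypothetical. Only the default sizes (a 32-entry issue queue and a 2K-entry WIB) come from the abstract. The model shows the key point: instructions on the dependence chain of an outstanding load miss leave the small issue queue for the large WIB, while independent instructions stay behind and can keep issuing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """An in-flight instruction; `srcs` holds the ids of its producers."""
    iid: int
    srcs: tuple[int, ...] = ()


class WIBWindow:
    """Hypothetical toy model: issue queue backed by a waiting instruction buffer.

    Simplifying assumption: an instruction is "waiting" when any producer is an
    outstanding load miss or is itself waiting; it is then moved to the WIB.
    """

    def __init__(self, iq_size: int = 32, wib_size: int = 2048) -> None:
        self.iq_size = iq_size
        self.wib_size = wib_size
        self.issue_queue: list[Instr] = []
        self.wib: list[Instr] = []
        # Waiting id -> ids of the outstanding load misses it still needs.
        # A missing load maps to itself, which is how its consumers get flagged.
        self.waits_on: dict[int, set[int]] = {}

    def _pending_misses(self, instr: Instr) -> set[int]:
        """Union of the outstanding misses that any source operand waits on."""
        pending: set[int] = set()
        for src in instr.srcs:
            pending |= self.waits_on.get(src, set())
        return pending

    def _drain(self) -> None:
        """Move waiting issue-queue entries into the WIB while it has room."""
        moved = True
        while moved:  # repeat so that transitive dependents are caught too
            moved = False
            for instr in list(self.issue_queue):
                pending = self._pending_misses(instr)
                if pending and len(self.wib) < self.wib_size:
                    self.issue_queue.remove(instr)
                    self.wib.append(instr)
                    self.waits_on[instr.iid] = pending
                    moved = True

    def dispatch(self, instr: Instr) -> bool:
        """Insert a new instruction; returns False if the issue queue is full."""
        if len(self.issue_queue) >= self.iq_size:
            return False
        self.issue_queue.append(instr)
        self._drain()
        return True

    def load_missed(self, load_id: int) -> None:
        """Mark an already-issued load as a cache miss and drain its dependents."""
        self.waits_on[load_id] = {load_id}
        self._drain()

    def load_completed(self, load_id: int) -> None:
        """Clear a finished miss, then reinsert instructions that no longer wait."""
        for iid in list(self.waits_on):
            self.waits_on[iid].discard(load_id)
            if not self.waits_on[iid]:
                del self.waits_on[iid]
        self.reinsert()

    def reinsert(self) -> None:
        """Move ready WIB entries back to the issue queue, oldest id first."""
        for instr in sorted(self.wib, key=lambda i: i.iid):
            if instr.iid in self.waits_on:
                continue  # still waits on another outstanding miss
            if len(self.issue_queue) >= self.iq_size:
                break  # no room now; a later call can retry
            self.wib.remove(instr)
            self.issue_queue.append(instr)


if __name__ == "__main__":
    w = WIBWindow(iq_size=4)
    w.load_missed(1)            # load 1 has issued and missed in the cache
    w.dispatch(Instr(2, (1,)))  # consumes the load result -> WIB
    w.dispatch(Instr(3, (2,)))  # transitive dependent -> WIB
    w.dispatch(Instr(4))        # independent -> stays in the issue queue
    print([i.iid for i in w.issue_queue], [i.iid for i in w.wib])  # [4] [2, 3]
    w.load_completed(1)         # miss returns; dependents are reinserted
    print([i.iid for i in w.issue_queue], [i.iid for i in w.wib])  # [4, 2, 3] []
```

Recording a set of misses per waiting instruction lets an instruction that depends on two different misses stay in the WIB until both have completed; a hardware implementation would need a far cheaper encoding than Python sets, and this sketch makes no claim about how the paper does it.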

Similar resources

Checkpoint Processing and Recovery: An Efficient, Scalable Alternative to Reorder Buffers

© 2003 IEEE. Published by the IEEE Computer Society. Achieving high performance in modern microprocessors requires a combination of exposing large amounts of instruction-level parallelism (ILP) and processing instructions at a high clock frequency. Exposing maximum ILP requires the processor to operate concurrently on large numbers of instructions, also known as the instructio...

Practical Precise Evaluation of Cache Effects on Low Level Embedded Vliw Computing

The introduction of caches inside high-performance processors provides technical ways to reduce the memory gap by tolerating long memory access delays. While such intermediate fast caches accelerate program execution in general, they have a negative impact on the predictability of program performance. This lack of performance stability is an undesirable characteristic for embedded computing. W...

Microarchitecture for Billion-Transistor VLSI Superscalar Processors

Gabriel Hsiuwei Loh, 2002. The vast computational resources in billion-transistor VLSI microchips can continue to be used to build aggressively clocked uniprocessors for extracting large amounts of instruction-level parallelism. This dissertation addresses the problems of implementing wide issue, out-of-order execution, supersca...

Effects of Multithreading on Cache Performance

As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising and exciting techniques used to tolerate memory latency by exploiting thread-level parallelism. The question, however, remains as to how effective multithreading is at tolerating memory latency. The...

Scaling Instruction Window

Contemporary superscalar processors employ large instruction windows to tolerate long latencies (mainly second-level cache misses) and exploit more instruction-level parallelism (ILP). On the one hand, a larger instruction window can buffer a larger number of instructions and find more independent instructions to execute; on the other hand, simply scaling the instruction window as a unified and single u...

Publication date: 2002
